Decomposition of the mean absolute error (MAE) into systematic and unsystematic components

When evaluating the performance of quantitative models, dimensioned errors are often characterized by sums-of-squares measures such as the mean squared error (MSE) or its square root, the root mean squared error (RMSE). In terms of quantifying average error, however, absolute-value-based measures such as the mean absolute error (MAE) are more interpretable than MSE or RMSE. Part of the historical preference for sums-of-squares measures is that they are mathematically amenable to decomposition, so that one can form ratios such as those based on separating MSE into its systematic and unsystematic components. Here, we develop and illustrate a decomposition of MAE into three useful submeasures: (1) bias error, (2) proportionality error, and (3) unsystematic error. This three-part decomposition of MAE is preferable to comparable decompositions of MSE because it provides more straightforward information on the nature of the model-error distribution. We illustrate the properties of our new three-part decomposition using a long-term reconstruction of streamflow for the Upper Colorado River.


June 20, 2022

The author(s) received no specific funding for this work. The authors have declared that no competing interests exist.

Given a set of model predictions ($P_i$, $i = 1, 2, \ldots, n$), where each $P_i$ corresponds to a reliable observation ($O_i$), the mean squared error (MSE) and the root mean squared error (RMSE):

$$\mathrm{MSE} = n^{-1} \sum_{i=1}^{n} (P_i - O_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

are routinely reported [6]. The mean absolute error (MAE):

$$\mathrm{MAE} = n^{-1} \sum_{i=1}^{n} |P_i - O_i|$$

is reported less often, even though it is a more meaningful indicator of average error [7]. We and others have shown elsewhere that error statistics based on sums-of-squares have a number of issues that make them less interpretable than those based on absolute [...]

[...] such that total error (as estimated by MSE) that is identified as systematic or unsystematic. This decomposition has served as a relatively insightful summary of model error (e.g., [13]) and has been used as a guide to model improvement because a model that has a large amount of systematic error usually can be respecified to reduce the consistent over- or under-prediction.
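The three average-error measures named above (MSE, RMSE, and MAE) can be computed in a few lines. The following is a minimal Python sketch, assuming NumPy is available; the function name `average_errors` and the four data pairs are hypothetical and serve only as an illustration.

```python
import numpy as np

def average_errors(pred, obs):
    """Return (MSE, RMSE, MAE) for paired predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = pred - obs                 # signed errors, P_i - O_i
    mse = np.mean(err ** 2)          # mean squared error
    rmse = np.sqrt(mse)              # root mean squared error
    mae = np.mean(np.abs(err))       # mean absolute error
    return mse, rmse, mae

# Hypothetical values, chosen so that one prediction is far off
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 2.0, 3.0, 8.0]
mse, rmse, mae = average_errors(pred, obs)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

In this toy case the single large error dominates the squared measures, so RMSE exceeds MAE; RMSE can never be smaller than MAE for the same set of errors.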

While decomposing MSE into its constituent components has been a useful approach, MSE is a flawed measure of average model error. Using MSE to identify systematic and unsystematic components of error, therefore, can produce misleading summaries of the types of errors that various models contain. Even more importantly, models may be inappropriately adjusted to reduce systematic error that has been misidentified by the MSE-based approach (e.g., when the impacts of outliers are overemphasized). As a result, our goal here is to develop and present a more rational approach for error decomposition that uses MAE as the baseline for average model error.

Although our goal is to partition MAE into components that represent systematic and unsystematic error patterns, we also want to move beyond the traditional two-part decomposition to further divide systematic errors into two separate components. One [...]

For each of these three types of error (bias, proportionality, and unsystematic), we develop a weighting function that can be used to partition MAE into its three components. We offer a diagram (Fig. 1) using a small, synthetic dataset to illustrate the estimation of the three components.

Here, we define bias as the component of systematic error that is contained in the over- or under-prediction of the observed mean. This is often referred to as the mean bias error (MBE):

$$\mathrm{MBE} = n^{-1} \sum_{i=1}^{n} (P_i - O_i) = \bar{P} - \bar{O}$$

In addition to indicating average over- or under-prediction, MBE can be used to develop a corresponding (to $P_i$) set of unbiased predicted values:

$$P'_i = P_i - \mathrm{MBE}$$

The magnitude (absolute value) of MBE can additionally serve as the weight that determines the relative importance of bias to the overall MAE:

$$b = |\mathrm{MBE}|$$

The magnitude of bias for our example dataset can be seen in the bottom left panel in [...]

[...] proportionality error, we use the unbiased predicted regression values:

$$\hat{P}'_i = \hat{P}_i - \mathrm{MBE}$$

Given that the OLS solution for $\hat{P}$ is constrained to pass through $(\bar{O}, \bar{P})$, $\hat{P}'$ passes through $(\bar{O}, \bar{O})$ and, therefore, is unbiased. Weights for the relative importance of proportionality error (for each $O_i$) are determined using the difference between the unbiased predictions and the observations (the red lines in Fig. 1):

$$p_i = |\hat{P}'_i - O_i|$$

[...]

Once again, if OLS regression is used for $\hat{P}'_i$, then the biased predictions and regression values produce the same weights:

$$u_i = |P_i - \hat{P}_i|$$
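The weights described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the regression is an OLS fit of predictions on observations (done here with `np.polyfit`), that the proportionality weight is $p_i = |\hat{P}'_i - O_i|$ with $\hat{P}'_i = \hat{P}_i - \mathrm{MBE}$, and that the unsystematic weight is $u_i = |P_i - \hat{P}_i|$. The function name `error_weights` and the data are hypothetical.

```python
import numpy as np

def error_weights(pred, obs):
    """Return (b, p, u): bias, proportionality, and unsystematic weights.

    Assumed forms (a reading of the text, not a confirmed specification):
    b = |MBE|, p_i = |P_hat_i - MBE - O_i|, u_i = |P_i - P_hat_i|.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)

    mbe = np.mean(pred - obs)            # mean bias error
    b = abs(mbe)                         # single bias weight for all i

    # OLS regression of predictions on observations: P_hat = a + s * O
    slope, intercept = np.polyfit(obs, pred, 1)
    pred_hat = intercept + slope * obs

    pred_hat_unbiased = pred_hat - mbe   # line now passes through (O_bar, O_bar)
    p = np.abs(pred_hat_unbiased - obs)  # proportionality weights
    u = np.abs(pred - pred_hat)          # unsystematic weights (OLS residuals)
    return b, p, u

# Hypothetical data, for illustration only
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([2.6, 2.9, 3.5, 3.9, 4.6])
b, p, u = error_weights(pred, obs)
print("b =", b)
print("p =", p)
print("u =", u)
```

Under these assumptions the proportionality weight reduces to $|s - 1|\,|O_i - \bar{O}|$, where $s$ is the regression slope, so it grows with distance from the observed mean and vanishes when the slope equals one.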

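The three weights for bias, proportionality, and unsystematic error developed above now can be used to scale the individual components of absolute error:

$$\mathrm{MAE}_b = n^{-1} \sum_{i=1}^{n} \frac{b}{b + p_i + u_i} |P_i - O_i|$$

$$\mathrm{MAE}_p = n^{-1} \sum_{i=1}^{n} \frac{p_i}{b + p_i + u_i} |P_i - O_i|$$

$$\mathrm{MAE}_u = n^{-1} \sum_{i=1}^{n} \frac{u_i}{b + p_i + u_i} |P_i - O_i|$$

A clear advantage of this weight-based decomposition of average error is that it uses MAE rather than MSE as the baseline. Another advantage is that predictions that have no error do not contribute to the components. This was not the case with the [...]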

It is possible for the denominator within these summations (i.e., $b + p_i + u_i$) to be zero, but that can only occur when a model has no bias and the regression line passes through a predicted value that has no error (i.e., when $b = 0$ and $P_i = \hat{P}_i = O_i$). If that rare model-prediction event occurs, those elements with $b + p_i + u_i = 0$ can simply be excluded from the summation.
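The weighted summations and the zero-denominator rule can be sketched together in Python. This is a hedged illustration, not the authors' code: it assumes each component is $n^{-1} \sum w_i\,|P_i - O_i| / (b + p_i + u_i)$ with $w_i$ equal to $b$, $p_i$, or $u_i$, that the weights are $b = |\mathrm{MBE}|$, $p_i = |\hat{P}_i - \mathrm{MBE} - O_i|$, and $u_i = |P_i - \hat{P}_i|$, and that the regression is an OLS fit. The function name `decompose_mae` and the data are hypothetical.

```python
import numpy as np

def decompose_mae(pred, obs):
    """Return (MAE_b, MAE_p, MAE_u) under the assumptions stated above."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = obs.size
    abs_err = np.abs(pred - obs)

    mbe = np.mean(pred - obs)
    slope, intercept = np.polyfit(obs, pred, 1)  # OLS fit of P on O
    pred_hat = intercept + slope * obs

    b = np.full(n, abs(mbe))              # bias weight, repeated for each i
    p = np.abs(pred_hat - mbe - obs)      # proportionality weights
    u = np.abs(pred - pred_hat)           # unsystematic weights

    denom = b + p + u
    # Exclude elements with a zero denominator. An exact comparison is used
    # here; a small tolerance may be preferable with floating-point data.
    keep = denom > 0

    def component(w):
        return np.sum(w[keep] / denom[keep] * abs_err[keep]) / n

    return component(b), component(p), component(u)

# Hypothetical data, for illustration only
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([2.6, 2.9, 3.5, 3.9, 4.6])
mae_b, mae_p, mae_u = decompose_mae(pred, obs)
mae = np.mean(np.abs(pred - obs))
print(f"MAE_b={mae_b:.4f}  MAE_p={mae_p:.4f}  MAE_u={mae_u:.4f}  MAE={mae:.4f}")
```

Because the three weights for each element are divided by their own sum, the three returned values add up to the ordinary MAE. Excluded elements necessarily have $|P_i - O_i| = 0$, so dropping them does not change that total.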

Using these definitions, $\mathrm{MAE}_b$, $\mathrm{MAE}_p$, and $\mathrm{MAE}_u$ are conservative and must sum to MAE:

$$\mathrm{MAE} = \mathrm{MAE}_b + \mathrm{MAE}_p + \mathrm{MAE}_u$$

As with $\mathrm{MSE}_s$ and $\mathrm{MSE}_u$, it is instructive to form ratios (i.e., $\mathrm{MAE}_b/\mathrm{MAE}$, [...]

[...] water-year totals for a large river and, therefore, are reported in billions of cubic meters per year. Both the observed data and reconstructed values are based on estimates of "naturalized" streamflow, which corrects for the anthropogenic alterations of flow (e.g., reservoirs and irrigation). In a recent article [13], the model was bias-corrected so that its empirical probability distribution better matched that of the observations (e.g., compare Fig. 2a and 2b). Here, we employ our new decomposition of model errors to [...]

Fig. 2. Model-estimation errors for (a) the reconstruction of Upper Colorado River flow from [14] and (b) the same reconstruction after applying the bias-correction procedure of [13].
Prior to bias correction (Fig. 2a), the Upper Colorado River reconstruction has low overall error, with an MAE of 2.12 billion m$^3$ (i.e., when compared to the $\bar{O}$ of 18.53 billion m$^3$). The small value of $\mathrm{MAE}_b$ (0.08 billion m$^3$) also shows that the reconstruction model faithfully reproduces the observed mean. From the scatterplot and the substantial amount (34%) of error in $\mathrm{MAE}_p$, however, it is clear that high-flow years are underestimated (and, to a lesser extent, low flows are overestimated). Even with these substantial proportionality errors, the majority (62%) of the mean absolute error is in $\mathrm{MAE}_u$, which is desirable (i.e., the majority of error is random or unsystematic). At the same time, the traditional decomposition into $\mathrm{MSE}_s$ and $\mathrm{MSE}_u$ masks the distinction between bias and proportionality error while also providing an underestimate of these combined systematic errors because it is inflating the "average" unsystematic error ($\mathrm{MSE}_u$) by squaring the model-predicted deviations from the [...]

Bias correction (Fig. 2b) [...] MAE. From the slope of the regression line, however, it is clear that there still is some proportionality error that the bias-correction procedure has not entirely removed. The MSE-based measures present a rosier picture of the reduction of systematic error, again due to the inflation of the unsystematic error produced by the squaring of the deviations around the regression line. Overall, the MAE-based approach shows that there is more room for additional improvement in the original reconstruction (Fig. 2a) and in the bias-correction procedure (Fig. 2b) than is evident in the MSE-based measures. In particular, the additional systematic component introduced here, $\mathrm{MAE}_p$, suggests that high flows still need to be adjusted upward.

[...] interpretable standard for evaluating model errors while also pointing to more specific types of error that may be reduced.